52 research outputs found

    Classification and transformation of dynamic dataflow programs

    Dataflow programming has been used to describe signal processing applications for many years, traditionally with cyclostatic dataflow (CSDF) or synchronous dataflow (SDF) models that restrict expressive power in favor of compile-time analysis and predictability. Dynamic dataflow is not restricted with respect to expressive power, but it does require runtime scheduling in the general case. Fortunately, most signal processing applications are far from being entirely dynamic, and parts with static behavior need not be dynamically scheduled. This paper presents a method to automatically analyze and classify blocks of a dynamic dataflow program within more restrictive dataflow models when possible, and to transform the blocks classified as static to improve execution speed by reducing the number of FIFO accesses. We applied this method to the actors of two dynamic dataflow descriptions of an MPEG-4 part 2 decoder, and studied how classification and transformation increase decoding speed.

    RVC-CAL dataflow implementations of MPEG AVC/H.264 CABAC decoding

    This paper describes the implementation of the MPEG AVC CABAC entropy decoder using the RVC-CAL dataflow programming language. CABAC (Context-based Adaptive Binary Arithmetic Coding) is the entropy coding method used by the Main and High profiles of the MPEG AVC/H.264 video standard. The CABAC algorithm provides increased compression efficiency but is more complex than other entropy coding algorithms. This implementation of the CABAC entropy decoder in RVC-CAL shows that complex algorithms can be implemented using a high-level design language. The paper analyzes in detail two possible methods of implementing the CABAC entropy decoder in the dataflow paradigm.

    HDS, a real-time multi-DSP motion estimator for MPEG-4 H.264 AVC high definition video encoding

    The H.264 AVC video compression standard achieves high compression rates at the cost of high encoder complexity. Encoder performance depends heavily on the motion estimation operation, which requires high computational power and memory bandwidth. A high-definition context magnifies the difficulty of a real-time implementation. EPZS and HME are two well-known motion estimation algorithms. Both are implemented on a DSP and their performance is compared in terms of quality and complexity. Based on these results, a new algorithm called HDS (Hierarchical Diamond Search) is proposed. HDS motion estimation is integrated into an AVC encoder to measure timings and the resulting video quality. A real-time DSP implementation of H.264 quarter-pixel accuracy motion estimation is proposed for SD and HD video formats. Furthermore, the characteristics of HDS make it well suited to real-time H.264 SVC encoding applications.

    A List Scheduling Heuristic with New Node Priorities and Critical Child Technique for Task Scheduling with Communication Contention

    Task scheduling is an important aspect of parallel programming. In this paper, the program to be scheduled is modeled as a Directed Acyclic Graph (DAG), and we target parallel embedded systems made of multiple processors connected by buses and switches. This paper presents improvements to list scheduling heuristics with communication contention. We use new node priorities (top level and bottom level) to sort nodes, and an advanced critical child technique to select the processor that executes a node. Experimental results show that our method is effective in reducing the schedule length, and that performance improves most in cases of medium and high communication. Since communication cost is rising from medium to high in modern applications such as digital communication and video compression, our method is well suited to scheduling these applications on parallel embedded systems.
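
    The top level and bottom level priorities named in this abstract can be illustrated with a minimal sketch of classic list scheduling. This is not the paper's heuristic: it models neither communication contention nor the critical child technique, and the example DAG, its costs, and the function names are all hypothetical. It assumes the usual textbook definitions, where the bottom level of a node is the longest path from that node to an exit node (including its own cost) and the top level is the longest path from an entry node to that node (excluding its own cost).

```python
from functools import lru_cache

# Hypothetical example DAG (not from the paper): node -> computation cost.
comp = {"a": 2, "b": 3, "c": 1, "d": 2}
# Edge (u, v) -> communication cost, paid only if u and v run on different processors.
comm = {("a", "b"): 1, ("a", "c"): 4, ("b", "d"): 2, ("c", "d"): 1}

succs = {n: [] for n in comp}
preds = {n: [] for n in comp}
for (u, v) in comm:
    succs[u].append(v)
    preds[v].append(u)


@lru_cache(maxsize=None)
def bottom_level(n):
    """Longest path from n to an exit node, including n's own cost."""
    return comp[n] + max((comm[(n, s)] + bottom_level(s) for s in succs[n]), default=0)


@lru_cache(maxsize=None)
def top_level(n):
    """Longest path from an entry node to n, excluding n's own cost."""
    return max((top_level(p) + comp[p] + comm[(p, n)] for p in preds[n]), default=0)


def list_schedule(num_procs):
    """Greedy list scheduling without contention (a simplifying assumption).

    Nodes are taken in decreasing bottom-level order, which is a valid
    topological order when all computation costs are positive. Each node
    goes to the processor that gives it the earliest finish time.
    Returns (schedule, makespan), where schedule maps node -> (proc, start, finish).
    """
    order = sorted(comp, key=lambda n: -bottom_level(n))
    proc_free = [0] * num_procs
    schedule = {}
    for n in order:
        best = None
        for p in range(num_procs):
            # Data from a predecessor on another processor arrives after the edge cost.
            ready = max(
                (schedule[q][2] + (0 if schedule[q][0] == p else comm[(q, n)])
                 for q in preds[n]),
                default=0,
            )
            start = max(proc_free[p], ready)
            finish = start + comp[n]
            if best is None or finish < best[2]:
                best = (p, start, finish)
        schedule[n] = best
        proc_free[best[0]] = best[2]
    return schedule, max(f for (_, _, f) in schedule.values())


if __name__ == "__main__":
    for n in comp:
        print(n, "top level =", top_level(n), "bottom level =", bottom_level(n))
    print(list_schedule(2))
```

    In this toy graph the communication costs are high relative to the computation costs, so the greedy rule keeps every node on one processor. That is the kind of medium-to-high communication case in which the abstract reports that the choice of priorities and processor matters most.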

    Advanced list scheduling heuristic for task scheduling with communication contention for parallel embedded systems

    Modern embedded systems tend to use multiple cores or processors to run parallel applications. This paper addresses task scheduling with communication contention for parallel embedded systems and proposes three advanced techniques to improve the list scheduling heuristic. First, five groups of node levels (two existing groups and three new ones) are used as node priorities to generate node lists. Then the critical child technique improves the selection of a processor during scheduling. Finally, the communication delay technique enlarges the idle time intervals on communication links. We also propose an advanced dynamic list scheduling heuristic that combines the three techniques. Experimental results show that the combined advanced dynamic heuristic shortens the schedule length for most of the randomly generated DAGs in cases of medium and high communication. Our method accelerates an application by up to 80% in the case of high communication and can also reduce the use of hardware resources.

    Reconfigurable video coding: a stream programming approach to the specification of new video coding standards

    Current video coding standards, and their reference implementations, are architected as large monolithic and sequential algorithms, in spite of the considerable overlap of functionality between standards, and the fact that they are frequently implemented on highly parallel computing platforms. The former leads to unnecessary complexity in the standardization process, while the latter implies that implementations have to be rebuilt from the ground up to reflect the parallel nature of the target. The upcoming Reconfigurable Video Coding (RVC) standard currently developed at MPEG attempts to address these issues by building a framework that supports the construction of video standards as libraries of coding tools. These libraries can be incrementally updated and extended, and the tools in them can be aggregated to form complete codecs using a streaming (or dataflow) programming model, which preserves the inherent parallelism of the coding algorithm. This paper presents the RVC framework and its underlying dataflow programming model, along with the tool support and initial results.

    Improved static task scheduling heuristic: impact on task sorting and processor allocation

    Task scheduling is an important step in the rapid prototyping of image processing applications on parallel embedded systems. In this article we present an improved static list scheduling heuristic. On the one hand, the heuristic incorporates new task priority rules that take into account contention on communications between tasks. On the other hand, it refines the allocation of a processor to the current task by letting a partial scheduling of the current task's critical successor (the "critical child") influence the choice of processor. Our experimental results show an effective speedup of the implemented application under both medium and high communication.

    Automated generation of an efficient MPEG-4 Reconfigurable Video Coding decoder implementation

    This paper proposes an automatic design flow from user-friendly design to efficient implementation of video processing systems. This design flow starts with the use of coarse-grain dataflow representations based on the CAL language, which is a complete language for dataflow programming of embedded systems. Our approach integrates previously developed techniques for detecting synchronous dataflow (SDF) regions within larger CAL networks, and exploiting the static structure of such regions using analysis tools in The Dataflow interchange format Package (TDP). Using a new XML format that we have developed to exchange dataflow information between different dataflow tools, we explore systematic implementation of signal processing systems using CAL, SDF-like region detection, TDP-based static scheduling, and CAL-to-C (CAL2C) translation. Our approach, which is a novel integration of three complementary dataflow tools -- the CAL parser, TDP, and CAL2C -- is demonstrated on an MPEG Reconfigurable Video Coding (RVC) decoder.